Database Support for Uncertain Data
Author: Sarvjeet Singh
Abstract
Singh, Sarvjeet. Ph.D., Purdue University, May 2009. Database Support for Uncertain Data. Major Professor: Sunil Prabhakar. In recent years, uncertainty management in databases has received considerable interest because numerous applications handle probabilistic data. In this dissertation, we identify and solve important issues in managing uncertain data natively at the database level. We propose semantics for the join operation in the presence of attribute uncertainty and present several pruning techniques that significantly improve join performance. We also present two index structures for categorical uncertain data. For the optimization of probabilistic queries, we discuss novel selectivity estimation techniques. We further introduce a new model for handling attributes with arbitrary pdfs (both discrete and continuous) natively at the database level. This model is consistent with Possible Worlds Semantics and is closed under the fundamental relational operations of selection, projection, and join. Finally, we present and discuss the implementation of Orion, a relational database with native support for uncertain data. Orion is developed as an extension of the open-source relational database PostgreSQL. Experiments performed in Orion show the effectiveness and efficiency of our approach.
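To make the idea of a join under attribute uncertainty concrete, the following is a minimal sketch, not the dissertation's algorithm or Orion's implementation. It assumes each uncertain categorical attribute is stored as a dictionary from value to probability, that the two attributes are independent, and that a pair joins when its match probability reaches a user-chosen threshold. The names `equality_probability` and `threshold_join` are hypothetical, and the sketch uses a plain nested loop with none of the pruning techniques the abstract mentions.

```python
from itertools import product

# Assumption: an uncertain categorical attribute is a dict that maps each
# possible value to its probability, and attributes are independent.


def equality_probability(a, b):
    """Probability that two independent discrete uncertain attributes are equal."""
    return sum(p * b.get(value, 0.0) for value, p in a.items())


def threshold_join(left, right, threshold):
    """Return (left_id, right_id, probability) for every pair whose
    match probability is at least `threshold` (naive nested loop, no pruning)."""
    results = []
    for (lid, la), (rid, ra) in product(left.items(), right.items()):
        prob = equality_probability(la, ra)
        if prob >= threshold:
            results.append((lid, rid, prob))
    return results


# Illustrative data only; not taken from the dissertation.
left = {"r1": {"A": 0.5, "B": 0.5}, "r2": {"C": 1.0}}
right = {"s1": {"A": 0.5, "C": 0.5}, "s2": {"B": 1.0}}
matches = threshold_join(left, right, 0.5)
print(matches)
```

With these inputs, the pair r1 and s1 agrees only on value A, with probability 0.5 × 0.5 = 0.25, so it falls below the 0.5 threshold, while r1 with s2 and r2 with s1 each match with probability 0.5 and are returned.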
Similar resources
A New Approach for Knowledge Based Systems Reduction using Rough Sets Theory (RESEARCH NOTE)
The problem of knowledge analysis for decision support systems is the most difficult task in information systems. This paper presents a new approach, based on notions from the mathematical theory of rough sets, to solve this problem. Using these concepts, a systematic approach has been developed to reduce the size of the decision database and extract a reduced rule set from vague and uncertain data. The method ha...
ORION: Managing Uncertain (Sensor) Data
An important quality of sensor data is that it is often uncertain or imprecise. This uncertainty can be an inherent aspect of the data (e.g. due to known errors in the measuring device, such as the Gaussian error in GPS readings), or it may be introduced in order to achieve scalability [2, 1], or to ensure a certain level of privacy [4]. Existing database management systems provide virtually no...
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning too...
Towards Data Abstraction in Networked Information Retrieval
Networked information retrieval aims at the interoperability of heterogeneous information retrieval (IR) systems. In this paper, we show how differences concerning search operators and database schemas can be handled by applying data abstraction concepts in combination with uncertain inference. Different data types with vague predicates are required to allow for queries referring to arbitrary at...
Semantics Representation of Probabilistic Data by Using Topk-Queries for Uncertain Data
Database systems for uncertain and probabilistic data promise to have many applications. Query processing on uncertain data occurs in the contexts of data warehousing, data integration, and of processing data extracted from the Web. Data cleaning can be fruitfully approached as a problem of reducing uncertainty in data and requires the management and processing of large amounts of uncertain dat...